Biostatistics For Dummies, 2nd Edition (Monika Wahi, John Pezzullo)

CHAPTER 20 Getting the Hint from Epidemiologic Inference 293

Avoiding overloading

You may think that choosing which covariates belong in a regression model is easy. You just put all the confounders and the exposure in as covariates and you’re done, right? Well, unfortunately, it’s not that simple. Each time you add a covariate to a regression model, you add another parameter to estimate, which uses up a degree of freedom and adds some estimation error to the model, no matter what covariate you choose to add. Although there is no official maximum to the number of covariates in a model, it is possible to add so many covariates that the software cannot compute the model, causing an error. On top of that, the usual fit statistics get at least slightly better whenever you add a covariate, whether or not the covariate is actually useful. In a logistic regression model as discussed in Chapter 18, each time you add a covariate, the maximized likelihood of the model goes up or stays the same; it never goes down. In Chapter 17, which focuses on ordinary least-squares regression, adding a covariate increases (or at least never decreases) your regression sum of squares.
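
The least-squares point can be checked with a small simulation. In ordinary least-squares regression, adding any covariate, even pure noise, can only lower (or leave unchanged) the residual sum of squares, which is the same thing as raising the regression sum of squares. The following is a minimal sketch, not code from this book: the variable names (exposure, noise_cov, outcome), the simulated data, and the helper functions solve and residual_ss are all made up for illustration, and it assumes only Python's standard library.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def residual_ss(columns, y):
    """Fit OLS with an intercept and return the residual sum of squares."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]  # design matrix rows
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

# Simulated data (hypothetical): the outcome depends on the exposure only.
random.seed(1)
n = 50
exposure = [random.gauss(0, 1) for _ in range(n)]
noise_cov = [random.gauss(0, 1) for _ in range(n)]  # unrelated to the outcome
outcome = [2.0 + 1.5 * x + random.gauss(0, 1) for x in exposure]

rss_small = residual_ss([exposure], outcome)
rss_big = residual_ss([exposure, noise_cov], outcome)
print(f"Residual SS, exposure only:         {rss_small:.4f}")
print(f"Residual SS, exposure + noise term: {rss_big:.4f}")
```

The second number should come out no larger than the first, even though the extra covariate has nothing to do with the outcome. That is why a better-looking fit statistic alone is not a good reason to keep a covariate.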

What this means is that you don’t want to add covariates to your model that just increase error and don’t help with the overall goal of model fit. A good strategy is to try to find the best collection of covariates that together account for as much of the variation in the outcome as possible. Think of it like roommates who share apartment-cleaning duties. It’s best if they split up the apartment and each clean different parts of it, rather than insisting on cleaning up the same rooms, which would be a waste of time. The term parsimony refers to trying to include the fewest covariates in your regression model that explain the most variation in the dependent variable. The modeling approaches discussed in the next section explain ways to develop such parsimonious models.
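
One common way to put a number on parsimony in ordinary least-squares regression is adjusted R-squared, which charges a small penalty for every covariate in the model. The sketch below is an illustration only and is not necessarily the approach this book uses: the function names are made up, and the sample size and sums of squares are hypothetical numbers chosen to show the idea.

```python
def r2(rss, tss):
    """Plain R-squared: share of total variation explained by the model."""
    return 1.0 - rss / tss

def adjusted_r2(rss, tss, n, k):
    """Adjusted R-squared for n observations and k covariates (intercept not counted)."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))

# Hypothetical numbers, made up for illustration only.
n = 50       # participants
tss = 200.0  # total sum of squares of the outcome
models = {
    # model name: (residual sum of squares, number of covariates)
    "exposure only": (60.0, 1),
    "exposure + weak covariate": (59.5, 2),
}
for name, (rss, k) in models.items():
    print(f"{name}: R2 = {r2(rss, tss):.4f}, "
          f"adjusted R2 = {adjusted_r2(rss, tss, n, k):.4f}")
```

With these made-up values, plain R-squared creeps up when the weak covariate is added, but adjusted R-squared goes down. The extra covariate is like a roommate re-cleaning a room that is already clean, so the smaller model is the more parsimonious choice.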

FIGURE 20-1: Example of how confounders are associated with exposure and outcome but are not on the causal pathway between exposure and outcome.